Using Linear Predictors to Impute Allele Frequencies from Summary or Pooled Genotype Data.
Abstract
Recently developed genotype imputation methods are a powerful tool for detecting untyped genetic variants that affect disease susceptibility in genetic association studies. However, existing imputation methods require individual-level genotype data, whereas in practice it is often the case that only summary data are available. For example, this may occur because, for reasons of privacy or politics, only summary data are made available to the research community at large; or because only summary data are collected, as in DNA pooling experiments. In this article, we introduce a new statistical method that can accurately infer the frequencies of untyped genetic variants in these settings, and indeed substantially improve frequency estimates at typed variants in pooling experiments where observations are noisy. Our approach, which predicts each allele frequency using a linear combination of observed frequencies, is statistically straightforward and related to a long history of the use of linear methods for estimating missing values (e.g., Kriging). The main statistical novelty is our approach to regularizing the covariance matrix estimates, and the resulting linear predictors, which is based on methods from population genetics. We find that, besides being both fast and flexible (allowing new problems to be tackled that cannot be handled by existing imputation approaches purpose-built for the genetic context), these linear methods are also very accurate. Indeed, imputation accuracy using this approach is similar to that obtained by state-of-the-art imputation methods that use individual-level data, but at a fraction of the computational cost.
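The core idea of the abstract, predicting every allele frequency as a linear combination of the observed ones, can be sketched as the conditional mean of a multivariate normal model, which is the same form used in simple Kriging. This is a minimal sketch under stated assumptions, not the authors' implementation: the helper names `shrunk_covariance` and `impute_frequencies`, the use of reference-panel frequencies as the prior mean, and the plain shrinkage toward the diagonal (standing in for the paper's population-genetics-based regularization) are all hypothetical choices. `sigma` and `noise_var` must be on the same scale; the toy values are arbitrary.

```python
import numpy as np


def shrunk_covariance(ref_haps, lam=0.1):
    """Covariance of 0/1 allele indicators across SNPs, estimated from a
    reference panel (rows = haplotypes, columns = SNPs) and shrunk toward
    its diagonal. Hypothetical stand-in for the paper's regularization."""
    s = np.cov(ref_haps, rowvar=False)
    return (1.0 - lam) * s + lam * np.diag(np.diag(s))


def impute_frequencies(f_obs, mu, sigma, obs_idx, noise_var=0.0):
    """Kriging-style linear predictor (Gaussian conditional mean):
    mu + sigma[:, obs] (sigma[obs, obs] + noise_var * I)^-1 (f_obs - mu[obs]).

    noise_var = 0 treats typed frequencies as exact (summary data), so they
    are reproduced; noise_var > 0 models noisy pooled measurements and also
    smooths the typed frequencies toward values implied by their neighbours."""
    obs_idx = np.asarray(obs_idx)
    s_oo = sigma[np.ix_(obs_idx, obs_idx)] + noise_var * np.eye(len(obs_idx))
    s_ao = sigma[:, obs_idx]
    weights = np.linalg.solve(s_oo, f_obs - mu[obs_idx])
    # A linear predictor can stray outside [0, 1], so clip to valid frequencies.
    return np.clip(mu + s_ao @ weights, 0.0, 1.0)


# Toy reference panel: SNP 1 is in perfect LD with SNP 0 in the panel.
ref_haps = np.array([[1, 1, 0], [1, 1, 1], [0, 0, 1],
                     [1, 1, 0], [0, 0, 0], [0, 0, 1]])
mu = ref_haps.mean(axis=0)              # panel frequencies as the prior mean
sigma = shrunk_covariance(ref_haps, lam=0.1)
obs_idx = [0, 2]                        # SNP 1 is untyped
f_obs = np.array([0.7, 0.5])            # observed study frequencies

pred = impute_frequencies(f_obs, mu, sigma, obs_idx)
pred_noisy = impute_frequencies(f_obs, mu, sigma, obs_idx, noise_var=0.05)
print(pred, pred_noisy)
```

With `noise_var=0` the typed SNPs are reproduced and only the untyped SNP is moved toward its LD partner; with a positive `noise_var` the typed estimates are also pulled toward values consistent with their neighbours, which mirrors the abstract's point about improving noisy pooled estimates.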
Similar resources
The Allele and Genotype Frequencies of Bovine Pituitary-specific Transcription Factor and Leptin Genes in Iranian Cattle and Buffalo Populations Using PCR-RFLP
The use of polymorphic markers in breeding programmes could make selection more accurate and efficient. A total of 324 individuals from six Iranian cattle populations (Sarabi, Golpayegani, Sistani, Taleshi, Mazandarani, Dashtiyari), F1 Golpayegani × Brown Swiss and Iranian buffalo populations were genotyped for the Pit-1 HinfI and leptin Sau3AI polymorphisms by the polymerase chain reactio...
Hardy Weinberg Equilibrium Testing and Interpretation: Focus on infection
Hardy-Weinberg equilibrium (HWE) holds when, in a closed population with random mating and without mutation and natural selection, genotype frequencies at any locus are a simple function of allele frequencies. Testing for HWE is now a common practice in population genetics and genetic association studies of non-communicable diseases; however, it is less regarded, or sometimes misinterpreted, i...
O-8: Some Variations of the TSSK2 Gene May be Associated with Impaired Spermatogenesis
Background: Tssk2, a member of the testis specific serine/threonine kinase (TSSK) family, is expressed predominantly in the testis and crucial for the formation and function of the sperm cells in mouse. Targeted deletion of Tssk1 and 2 in male chimeric mice caused infertility due to haploinsufficiency of the genes. Therefore, it is reasonable to postulate that mutations in its human homologue TS...
Numerical analysis of intensity signals resulting from genotyping pooled DNA samples in beef cattle and broiler chicken.
Pooled genomic DNA has been proposed as a cost-effective approach in genomewide association studies (GWAS). However, algorithms for genotype calling of biallelic SNP are not adequate with pooled DNA samples because they assume the presence of 2 fluorescent signals, 1 for each allele, and operate under the expectation that at most 2 copies of the variant allele can be found for any given SNP and...
Analysis of TP53 Codon 72 Polymorphism in Mucinous and Non-Mucinous Colorectal Adenocarcinoma in Isfahan, Iran
Background: The tumor suppressor gene TP53 (alias p53) located on chromosome 17 is involved in various cancers. Case-control studies have shown that p53 codon 72 polymorphism modulates the prognosis and susceptibility to various malignancies. We undertook the present study to explore a possible association between mucinous and non-mucinous adenocarcinomas with different genotypes or alleles at ...
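The Hardy-Weinberg entry in the list above rests on the fact that, under HWE, a biallelic locus with allele frequencies p and q = 1 - p has expected genotype frequencies p^2, 2pq and q^2. A minimal sketch of the classical Pearson chi-square goodness-of-fit test follows; the function name `hwe_chi_square` is hypothetical, and for small samples or rare alleles an exact test is usually preferred.

```python
from math import erfc, sqrt


def hwe_chi_square(n_AA, n_Aa, n_aa):
    """Pearson chi-square test (1 degree of freedom) for Hardy-Weinberg
    equilibrium at a biallelic locus, given observed genotype counts."""
    n = n_AA + n_Aa + n_aa
    p = (2 * n_AA + n_Aa) / (2 * n)      # estimated frequency of allele A
    q = 1.0 - p
    if not 0.0 < p < 1.0:
        raise ValueError("locus is monomorphic; HWE test is undefined")
    # Expected genotype counts under HWE: n*p^2, n*2pq, n*q^2
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_AA, n_Aa, n_aa)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of a chi-square(1) variable: P(X > x) = erfc(sqrt(x / 2))
    p_value = erfc(sqrt(chi2 / 2.0))
    return p, chi2, p_value


print(hwe_chi_square(25, 50, 25))   # counts in exact HWE proportions
print(hwe_chi_square(30, 40, 30))   # heterozygote deficit
```

The second call has the same allele frequency (0.5) as the first but fewer heterozygotes than the 50 expected, giving a chi-square statistic of 4 and a p-value near 0.046.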
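The pooled-DNA entry in the list above notes that genotype-calling algorithms assume at most two copies of the variant allele per sample. For a pool, one common alternative is to read the allele frequency directly from the two fluorescent signals, correcting for unequal signal strength with a factor k estimated from heterozygous individuals, whose true allele ratio is 1:1. The sketch below assumes that estimator; it is not necessarily the method of the cited study, and the function name `pooled_allele_frequency` and the intensity values are hypothetical.

```python
from statistics import mean


def pooled_allele_frequency(signal_a, signal_b, het_a, het_b):
    """Estimate the frequency of allele A in a DNA pool as A / (A + k*B).

    k corrects for unequal signal strength of the two alleles; it is taken
    as the mean A/B intensity ratio among individually genotyped
    heterozygotes, for which the true allele ratio is 1:1."""
    k = mean(a / b for a, b in zip(het_a, het_b))
    return signal_a / (signal_a + k * signal_b)


# Hypothetical intensities: allele A's signal reads twice as bright as B's.
het_a, het_b = [2.0, 2.2, 1.8], [1.0, 1.1, 0.9]
p_hat = pooled_allele_frequency(3.0, 1.0, het_a, het_b)
print(p_hat)
```

Here k = 2, so the corrected estimate is 3 / (3 + 2) = 0.6, whereas the uncorrected ratio 3 / (3 + 1) = 0.75 would overstate the frequency of allele A.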
Journal title:
- The Annals of Applied Statistics
Volume 4, Issue 3
Pages: -
Publication date: 2010